68 research outputs found

Improving Small Object Proposals for Company Logo Detection

Author: Bell S.
Eggert C.
Ioffe S.
Oliveira G.
Redmon J.
Simonyan K.
Zitnick L.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

Many modern approaches for object detection are two-staged pipelines. The first stage identifies regions of interest which are then classified in the second stage. Faster R-CNN is such an approach for object detection which combines both stages into a single pipeline. In this paper we apply Faster R-CNN to the task of company logo detection. Motivated by its weak performance on small object instances, we examine in detail both the proposal and the classification stage with respect to a wide range of object sizes. We investigate the influence of feature map resolution on the performance of those stages. Based on theoretical considerations, we introduce an improved scheme for generating anchor proposals and propose a modification to Faster R-CNN which leverages higher-resolution feature maps for small objects. We evaluate our approach on the FlickrLogos dataset improving the RPN performance from 0.52 to 0.71 (MABO) and the detection performance from 0.52 to 0.67 (mAP). Comment: 8 Pages, ICMR 201

arXiv.org e-Print Archive

OPUS Augsburg

Crossref

Conditional Image-Text Embedding Networks

Author: A Gordo
A Rohrbach
BA Plummer
CL Zitnick
F Radenović
L Yu
M Wang
R Krishna
Publication venue
Publication date: 28/07/2018
Field of study

This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies the representation requirements for individual embeddings and allows the underrepresented concepts to take advantage of the shared representations before feeding them into concept-specific layers. Comprehensive experiments verify the effectiveness of our approach across three phrase grounding datasets, Flickr30K Entities, ReferIt Game, and Visual Genome, where we obtain a (resp.) 4%, 3%, and 4% improvement in grounding performance over a strong region-phrase embedding baseline. Comment: ECCV 2018 accepted paper

arXiv.org e-Print Archive

Crossref

Post-Turing Methodology: Breaking the Wall on the Way to Artificial General Intelligence

Author: A Alekseev
A Pease
A Turing
CL Zitnick
D Dennett
G Marcus
H Ishiguro
H Kitano
L Birglen
L Morgenstern
P Clark
S Baum
S Chakraborty
S Harnad
SS Adams
Vincent C Müller
W Jarrold
Publication venue
Publication date: 01/01/2020
Field of study

This article offers comprehensive criticism of the Turing test and develops quality criteria for new artificial general intelligence (AGI) assessment tests. It is shown that the prerequisites A. Turing drew upon when reducing personality and human consciousness to “suitable branches of thought” reflected the engineering level of his time. In fact, the Turing “imitation game” employed only symbolic communication and ignored the physical world. This paper suggests that by restricting thinking ability to symbolic systems alone Turing unknowingly constructed “the wall” that excludes any possibility of transition from a complex observable phenomenon to an abstract image or concept. It is, therefore, sensible to factor in new requirements for AI (artificial intelligence) maturity assessment when approaching the Turing test. Such AI must support all forms of communication with a human being, and it should be able to comprehend abstract images and specify concepts as well as participate in social practices.

PhilPapers

Crossref

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Author: A Risnumawan
B Shi
C Yao
CL Zitnick
E Shelhamer
Jifeng Dai
L Gómez
L Neumann
Lukas Neumann
M Jaderberg
M Jaderberg
M Liao
S Hochreiter
S Ren
W Liu
Weilin Huang
Y Zhu
Y Zhu
Z Tian
Publication venue
Publication date: 01/08/2018
Field of study

Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network model for scene text spotting is proposed. The proposed model, named as Mask TextSpotter, is inspired by the newly published work Mask R-CNN. Different from previous methods that also accomplish text spotting with end-to-end trainable deep neural networks, Mask TextSpotter takes advantage of simple and smooth end-to-end learning procedure, in which precise text detection and recognition are acquired via semantic segmentation. Moreover, it is superior to previous methods in handling text instances of irregular shapes, for example, curved text. Experiments on ICDAR2013, ICDAR2015 and Total-Text demonstrate that the proposed method achieves state-of-the-art results in both scene text detection and end-to-end text recognition tasks. Comment: To appear in ECCV 201

arXiv.org e-Print Archive

Crossref

Fuzzy Free Path Detection from Disparity Maps by Using Least-Squares Fitting to a Plane

Author: A Wedel
A Wedel
C Capelle
C Lawrence Zitnick
C Leung
CC Lee
CL Chen
D Dubois
D Murray
D Scharstein
E Grosso
E Sampaio
H Hirschmuller
HJ Sun
J Moreno-Garcia
JC Fodor
L Cai
L Nalpantidis
M Vergauwen
N Ortigosa
N Ortigosa
NB Hui
Nuria Ortigosa
P Bach y Rita
PF Felzenszwalb
Q Yang
S Birchfield
S Gehrig
Samuel Morillas
SW Lee
TH Nguyen
UR Dhond
Y Huang
Z Sun
ZY Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2014
Field of study

A method to detect obstacle-free paths in real-time which works as part of a cognitive navigation aid system for visually impaired people is proposed. It is based on the analysis of disparity maps obtained from a stereo vision system which is carried by the blind user. The presented detection method consists of a fuzzy logic system that assigns a certainty to be part of a free path to each group of pixels, depending on the parameters of a planar-model fitting. We also present experimental results on different real outdoor scenarios showing that our method is the most reliable in the sense that it minimizes the false positives rate.

N. Ortigosa acknowledges the support of Universidad Politecnica de Valencia under grant FPI-UPV 2008 and Spanish Ministry of Science and Innovation under grant MTM2010-15200. S. Morillas acknowledges the support of Universidad Politecnica de Valencia under grant PAID-05-12-SP20120696.

Ortigosa Araque, N.; Morillas Gómez, S. (2014). Fuzzy Free Path Detection from Disparity Maps by Using Least-Squares Fitting to a Plane. Journal of Intelligent and Robotic Systems. 75(2):313-330. https://doi.org/10.1007/s10846-013-9997-1

Crossref

RiuNet

A stereoscopic video conversion scheme based on spatio-temporal analysis of MPEG videos

Author: AM Tekalp
B Mendiburu
C Fehn
C Fehn
CL Zitnick
CM Cheng
D Kim
G Guo
GS Lin
GS Lin
H Murata
HM Wang
HR Wu
HT Quan
I Ideses
I Ideses
J Ko
J Kopf
JC Chiang
KJ Oh
L Zhang
LJ Angot
M Kim
M Pourazad
MC Lee
MH Pinson
MJ Young
P Harman
RC Gonzalez
T Okino
WN Lie
Y Feng
YP Tan
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

The Art of Questioning Lethal Vision: Mosse’s Infra and Militarized Machine Vision

Crossref

TamPub Julkaisuarkisto - TamPub Institutional Repository

Trepo - Institutional Repository of Tampere University

Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking

Author: D Gunning
L Zitnick
M Jaderberg
P Shivaswamy
S Ren
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Accelerating Depth Image-Based Rendering Using GPU

Author: C. Zitnick
L. Levkovich-Maslyuk
R. Pajarola
R. Rost
W. Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006
Field of study

In this paper, we propose a practical method for hardware-accelerated rendering of the depth image-based representation (DIBR) object, which is defined in MPEG-4 Animation Framework eXtension (AFX). The proposed method overcomes the drawbacks of conventional rendering, namely that it is slow because it is hardly assisted by graphics hardware, and that surface lighting is static. Utilizing the new features of the modern graphics processing unit (GPU) and programmable shader support, we develop an efficient hardware-accelerated rendering algorithm for depth image-based 3D objects. Surface rendering in response to varying illumination is performed inside the vertex shader while adaptive point splatting is performed inside the fragment shader. Experimental results show that the rendering speed increases considerably compared with the software-based rendering and the conventional OpenGL-based rendering method.

CiteSeerX

Crossref